The data used in the experiments described in the paper [KaDe]:

Because the data on the NCBI web site are updated continuously, we provide the versions that we used in our research.

################ Taxonomy linked data: ################
* taxdump.tar.gz		-	this file was downloaded on August 6, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy
							It includes 2 files: 
							* nodes.dmp - represents taxonomy nodes 
							* names.dmp - includes taxonomy names

* gi_taxid_nucl.dmp.gz	-	this file was downloaded on December 19, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy
							It contains two columns: the GenBank identifier (gi) and taxonomy identifier (taxid).
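
A minimal sketch of how these files might be read, assuming the two columns of gi_taxid_nucl.dmp are
whitespace-separated integers and that nodes.dmp uses the standard NCBI dump layout (fields separated by
tab, pipe, tab, with the taxid first and the parent taxid second). The function names are illustrative
and are not part of CoMeta.

```python
import gzip


def parse_gi_taxid(lines):
    """Build a gi -> taxid dictionary from two-column lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        mapping[int(fields[0])] = int(fields[1])
    return mapping


def parse_nodes(lines):
    """Build a taxid -> parent taxid dictionary from nodes.dmp lines."""
    parents = {}
    for line in lines:
        # Assumed layout: fields are separated by tab, pipe, tab.
        fields = line.rstrip("\n").split("\t|\t")
        if len(fields) < 2:
            continue
        parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(taxid, parents):
    """Walk from a taxid up to the root (a node that is its own parent)."""
    path = [taxid]
    while parents.get(taxid, taxid) != taxid:
        taxid = parents[taxid]
        path.append(taxid)
    return path


def load_gi_taxid(path):
    """Read a gzip-compressed gi/taxid file, e.g. gi_taxid_nucl.dmp.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_gi_taxid(handle)
```

The full gi_taxid_nucl.dmp.gz file is large, so holding it in a plain dictionary may need a lot of memory;
filtering for the gi numbers of interest while reading is one alternative.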


################ Reference sequences: ################
nt.xx.tar		-	xx ranges from 00 to 12. Compressed archives containing the reference sequences. These files were
				downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/. Their last update was on July 27, 2012.
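
As a small sketch of the naming scheme above, the expected archive names can be generated and checked
against a local directory. The helper names and the directory argument are illustrative assumptions.

```python
from pathlib import Path


def nt_volume_names(first=0, last=12):
    """Return the archive names nt.00.tar ... nt.12.tar."""
    return [f"nt.{i:02d}.tar" for i in range(first, last + 1)]


def missing_volumes(directory):
    """List the expected archives that are not present in the directory."""
    return [name for name in nt_volume_names()
            if not (Path(directory) / name).is_file()]
```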


################ Metagenomic sets: ################
* facs_full.fa	-	contains 100,000 reads with an average length of 269 bp.
					These reads were acquired from the FACS web site http://facs.scilifelab.se/
					and were originally used by Stranneheim et al. [Str].

* facs_reduced.fa	-	the facs_full set after reduction, containing 93,653 reads with an average length of 269 bp.

* carma.fa		-	contains 25,000 reads with an average length of 265 bp. These reads were acquired from the WebCARMA web site
				http://wwww.cebitec.uni-bielefeld.de/webcarma.cebitec.uni-bielefeld.de/ and were originally used by Gerlach
				and Stoye [GeSt].

* metaphyler.fa	-	contains 66,841 reads, each 300 bp long. The set was originally used by Liu et al. [Liu] and contained 73,086
						reads, some of which had no information about their origin.

* phylopythia.fa	-	contains 114,457 unique reads with an average length of 961 bp. The set was originally used by Patil et al. [Pat]
						and contained 124,941 reads, some of which were repeated.

The MetaPhyler and PhyloPythia sets are courtesy of Adam Bazinet, who used them in his paper [BaCu].
We added a taxonomy identifier (taxid) to each set. For a more detailed description of the data, see the article [KaDe].
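
The read counts and average lengths listed above can be checked with a short script. This is a sketch
that assumes the .fa files are plain FASTA (header lines start with ">", sequences may span several lines);
it makes no assumption about how the taxid is stored in the headers. The function name is illustrative.

```python
def fasta_stats(lines):
    """Return (number of reads, average read length) for FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        if line.startswith(">"):
            lengths.append(0)  # a header line starts a new read
        elif lengths:
            lengths[-1] += len(line)  # a sequence may span several lines
    count = len(lengths)
    average = sum(lengths) / count if count else 0.0
    return count, average
```

For example, passing an open file handle for facs_full.fa to fasta_stats returns the read count and the
mean read length in bp.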

The HiSeq and MiSeq datasets that we used in the paper [KaDe] were downloaded from the https://ccb.jhu.edu/software/kraken website.
These data were originally used by Wood and Salzberg [WoSa].



################################################################################################################################
[KaDe]	Kawulok J, Deorowicz S, CoMeta: Classification of metagenomes using k-mers, doi: 10.1371/journal.pone.0121453
[GeSt]	Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with
		CARMA3. Nucleic acids research 39.
[Str]	Stranneheim H, Käller M, Allander T, Andersson B, Arvestad L, Lundeberg J (2010)
		Classification of DNA sequences using Bloom filters. Bioinformatics 26: 1595-1600.
[Liu]	Liu B, Gibbons T, Ghodsi M, Pop M (2010) MetaPhyler: Taxonomic profiling for metagenomic
		sequences. In: Proceedings of the 2010 IEEE International Conference on Bioinformatics and
		Biomedicine, BIBM 2010. pp. 95-100.
[Pat]	Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC (2011) Taxonomic metagenome
		sequence assignment with structured output models. Nature Methods 8: 191-192.
[BaCu]	Bazinet A, Cummings M (2012) A comparative evaluation of sequence classification programs.
		BMC Bioinformatics 13: 1-13.
[WoSa]	Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.
